-
Notifications
You must be signed in to change notification settings - Fork 472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix regular expression parsing #3244
Conversation
f3b0770
to
56a35df
Compare
@@ -372,12 +372,13 @@ private function parseGroups(string $regex): ?array | |||
{ | |||
if (self::$parser === null) { | |||
/** @throws void */ | |||
self::$parser = Llk::load(new Read('hoa://Library/Regex/Grammar.pp')); | |||
self::$parser = Llk::load(new Read(__DIR__ . '/../../../resources/RegexGrammar.pp')); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ondrejmirtes Are we ok loading the file from here instead? Not sure what this does in terms of licensing, nor if there's a better place to put it. If so I'll delete the patch file and drop the hoa/regex requirement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this path will work in the packaged phpstan.phar - while I agree that having the RegexGrammar.pp
in the phpstan-src repo might ease the contribution experience - as the patching is really cumbersome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I completely agree moving from the patching as the patching is very hard to understand and the changes needed are more and more complex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I got rid of it, we'll see what @ondrejmirtes says :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose to add the whole and unpatched RegexGrammar.pp
as first/separate commit, then the patched/actual grammar and then the changes from this PR. This will hugely improve the review and git history.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, done 👍🏻
This pull request has been marked as ready for review. |
@staabm please review the last |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks really good for me.
only open question (which Ondrej has to decide on): what todo with the RegexGrammar.pp
.
|
||
|
||
// Skip. | ||
%skip nl \n |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
\n
can be in pcre class as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I am unsure why this is here, I haven't experimented much, but it is rare to have actual newlines in regexes as they're usually escaped as \n, while this here matches a real newline.
I'll create a new issue for this to look at it later, together with proper support for the x modifier, because it's very much related. Edit: phpstan/phpstan#11360
- Skip regex parsing if the pattern is not valid, but report errors then if parsing fails - Fix support for internal options to include more options, and fix []] (i.e. class end delimiter as first character is not a literal) not parsing correctly - Add a couple missing quantifiers in getQuantificationRange - Handle more literal values and fix escaped ones to be unescaped
I cherry-picked the grammar in the repo directly to 1.11.x, rebased this PR, and merged it. Thanks! :) |
I moved the grammar file into the repo instead of the hoa regex one with patch, that makes it a lot easier to work with.
This PR fixes parsing of complex character classes which was really not well supported in hoa/regex, along with a bunch of other smaller issues.